Bounds for Compression in Streaming Models

Abstract. Compression algorithms and streaming algorithms are both
powerful tools for dealing with massive data sets, but many of the best
compression algorithms (e.g., those based on the Burrows-Wheeler
Transform) at first seem incompatible with streaming. In this paper
we consider several popular streaming models and ask in which, if any,
we can compress as well as we can with the BWT. We first prove a nearly
tight tradeoff between memory and redundancy for the Standard, Multipass
and W-Streams models, demonstrating a bound that is achievable
with the BWT but unachievable in those models. We then show we can
compute the related Schindler Transform in the StreamSort model and
the BWT in the Read-Write model and, thus, achieve that bound.



1 Introduction

The increasing size of data sets over the past decade has inspired work on both
data compression and streaming algorithms. In compression research, the advent
of the Burrows-Wheeler Transform [5] (BWT) has led to great improvements in
both theory and practice. Streaming algorithms, meanwhile, are now used not
only to process data online but also, because sequential access is so much faster
than random access, to process them on disk more quickly. To combine these
advances, it seems we must find a way to compute something like the BWT in
a streaming model, e.g.:

1. Standard: In the simplest and most restrictive model, we are allowed only
one pass over the input and memory sublinear (usually polylogarithmic) in
the input's size (see, e.g., [13]).
2. Multipass: In one of the earliest papers on what is now called streaming,
Munro and Paterson [22] proposed a model in which the input is stored
on a one-way, read-only tape (representing external memory) that is
completely rewound whenever we reach the end.
3. W-Streams: Ruhl [25] proposed a model in which we can also write: during
each pass over the tape, we can replace its contents with something up to a
constant factor larger.

4. StreamSort: Since we cannot sort in the W-Streams model with polylogarithmic
memory and passes, Ruhl also proposed a generalization in which we
can sort the contents of the tape at the cost of a constant number of passes
(see also fT). 

5. Read-Write: Grohe and Schweikardt [13] noted that, with an additional
tape, we can sort even in the W-Streams model; they proposed a model in
which we have a read-write input tape, some number of read-write work
tapes, and a write-only output tape.

In a previous paper [12] we proved nearly tight bounds on how well we can
compress in the Standard model with constant memory. We also showed how
LZ77 [28] can be implemented with a growing sliding window to use slightly
sublinear memory without increasing the bound on its redundancy, but that
this cannot be done for sublogarithmic memory. Those bounds are, however,
tangential to the common assumption of polylogarithmic memory. In this paper
we consider whether the following bound can be achieved in the models
described above with polylogarithmic resources: Given a string $s$ of length $n$ over
an alphabet of constant size $\sigma$, can we store $s$ in $O(n H_k(s) + \sigma^k \log n)$ bits for
all $k$ simultaneously, where $H_k(s)$ is the $k$th-order empirical entropy of $s$? We
first prove the bound unachievable in the first three models, via a nearly tight
tradeoff between memory and redundancy. We then show we can compute the
Schindler Transform [26] (ST) in the StreamSort model and the BWT in the
Read-Write model, so the bound is achievable in them.

We start by reviewing some preliminary material in Section 2. In Section 3
we show how, for any constants $c$ and $\epsilon$ with $1 > c > 0$ and $\epsilon > 0$, we can
store $s$ in $n H_k(s) + O(\sigma^k n^{1 - c + \epsilon})$ bits in the Standard model with $O(n^c)$ bits
of memory; we then show this bound is nearly optimal in the sense that we
cannot always store $s$ in, nor recover it from, $O(n H_k(s) + \sigma^k n^{1 - c - \epsilon})$ bits. In
Section 4 we extend our tradeoff to the Multipass and W-Streams models. Since
we can store BWT($s$) in, and recover it from, $3.4 n H_k(s) + O(\sigma^k)$ bits in even
the Standard model, our tradeoff implies we can neither compute nor invert the
BWT in the first three models. In Section 5 we show how we can compute the
ST in the StreamSort model and in Section 6 we show how we can compute
the BWT in the Read-Write model. Using either transform, we can store $s$ in
$1.8 n H_k(s) + O(\sigma^k \log n)$ bits in these models.

2 Preliminaries 

Throughout this paper, we assume $s = s_1 \cdots s_n$ is a string of length $n$ over an
alphabet of constant size $\sigma$, and $c$ and $\epsilon$ are constants with $1 - \epsilon > c > \epsilon > 0$.
The 0th-order empirical entropy $H_0(s)$ of $s$ is the entropy of the characters'
distribution in $s$, i.e., $H_0(s) = \frac{1}{n} \sum_{a \in s} n_a \log \frac{n}{n_a}$, where $a \in s$ means character $a$
occurs in $s$ and $n_a$ is its frequency. The $k$th-order empirical entropy $H_k(s)$ of $s$
for $k \geq 1$, described in detail by Manzini [21], is defined as

$$H_k(s) = \frac{1}{n} \sum_{|w| = k} |w_s| H_0(w_s),$$

where $w_s$ is the concatenation of characters that immediately follow occurrences
of $w$ in $s$.
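
As a concrete illustration of these definitions, the following Python sketch computes $H_0$ and $H_k$ directly from character counts. The function names `h0` and `hk` are ours, and we assume logarithms are taken base 2, so that entropies are measured in bits per character.

```python
from collections import Counter, defaultdict
from math import log2


def h0(x):
    """0th-order empirical entropy of the sequence x, in bits per character."""
    n = len(x)
    if n == 0:
        return 0.0
    return sum(c * log2(n / c) for c in Counter(x).values()) / n


def hk(s, k):
    """kth-order empirical entropy of the string s, per the definition above."""
    if not s:
        return 0.0
    if k == 0:
        return h0(s)
    followers = defaultdict(list)  # w -> characters that immediately follow w in s
    for i in range(k, len(s)):
        followers[s[i - k:i]].append(s[i])
    return sum(len(ws) * h0(ws) for ws in followers.values()) / len(s)
```

For example, `hk("abababab", 1)` is 0 because each character is determined by its predecessor, whereas `h0("abababab")` is 1 bit per character.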

The BWT is an invertible transform that permutes the characters in $s$ so
that $s_i$ comes before $s_j$ if $s_{i-1} \cdots s_{i-n}$ is lexicographically less than $s_{j-1} \cdots s_{j-n}$,
taking indices modulo $n + 1$ and considering $s_0 = s_{n+1}$ to be lexicographically
less than any character in the alphabet. In other words, the BWT sorts $s$'s
characters in order of their contexts, which start at their predecessors and
extend backwards. Although the distribution of characters remains the same,
so $H_0(\mathrm{BWT}(s)) = H_0(s)$, characters with similar contexts are moved close
together so, if $s$ is compressible with an algorithm that uses contexts,
then BWT($s$) is compressible with an algorithm that takes advantage of local
homogeneity. Move-to-front [3] and distance coding [4,8] are two reversible
transforms that turn strings with local homogeneity into strings of numbers with low
0th-order empirical entropy: move-to-front keeps a list of the characters in the
alphabet and replaces each character in the input by its position in the list and
then moves it to the front of the list; distance coding writes the distance to
the first occurrence of each character and the length of the string, then replaces
each character in the input by the distance from the last occurrence of that
character, omitting 1s. (The numbers are often written with, e.g., Elias' delta
code [S], in which case move-to-front and distance coding become compression
algorithms themselves.) Building on work by Kaplan, Landau and Verbin [16], we
showed in another previous paper [11] that composing the BWT, move-to-front,
run-length coding and arithmetic coding produces an encoding that contains
at most $3.4 n H_k(s) + O(\sigma^k)$ bits for all $k$ simultaneously; with the BWT,
distance coding and arithmetic coding, the bound is $1.8 n H_k(s) + O(\sigma^k \log n)$ bits.
Using the BWT in more sophisticated ways, Ferragina, Manzini, Mäkinen and
Navarro [10] and Mäkinen and Navarro [20] achieved a bound on the encoding's
length of $n H_k(s) + o(n / \log n)$ bits for all $k$ at most a constant proper fraction of
$\log_\sigma n$; Grossi, Gupta and Vitter [14] achieved a bound of $n H_k(s) + O(\sigma^k \log n)$
bits for all $k$ simultaneously, matching a lower bound due to Rissanen [24]. In
Section 3 we show how, because length times empirical entropy is superadditive
(i.e., $|a| H_k(a) + |b| H_k(b) \leq |ab| H_k(ab)$), we can trade off between memory
and redundancy by breaking $s$ into blocks and encoding each block in turn with
Grossi, Gupta and Vitter's algorithm: the memory needed decreases by a factor
roughly equal to the number of blocks, while the redundancy increases by
that much. In Section 5 we use the ST instead of the BWT. When using contexts
of length $k$, the ST permutes the characters of $s$ so that $s_i$ comes before
$s_j$ if $s_{i-1} \cdots s_{i-k}$ is lexicographically less than $s_{j-1} \cdots s_{j-k}$ or, when they are
equal, if $i < j$. By the same arguments as for the BWT, composing the ST,
distance coding and arithmetic coding produces an encoding that contains at most
$1.8 n H_k(s) + O(\sigma^k \log n)$ bits for the chosen context length $k$.
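
To make the definition of the BWT used here concrete (contexts start at a character's predecessor and extend backwards), here is a minimal in-memory Python sketch. It is not a streaming algorithm: it sorts all contexts explicitly. The function name `bwt_backward` and the choice of `#` as the special character are our assumptions, and `#` must be smaller than every character of the input.

```python
def bwt_backward(s, sentinel="#"):
    """BWT with backward contexts, as defined above.

    Assumption: `sentinel` does not occur in s and compares less than
    every character of s. Indices are taken modulo len(s) + 1.
    """
    t = s + sentinel
    m = len(t)

    def context(i):
        # Characters preceding position i, nearest first, wrapping around.
        return [t[(i - d) % m] for d in range(1, m)]

    order = sorted(range(m), key=context)
    return "".join(t[i] for i in order)
```

On the paper's running example, `bwt_backward("mississippi")` returns `ms#spipissii`, the string given for Figure 1 in Section 6.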

A $\sigma$-ary De Bruijn sequence of order $k \geq 1$ contains each possible $k$-tuple
exactly once and, it follows, has length $\sigma^k + k - 1$ and starts and ends with
the same $k - 1$ characters. Notice that, if $d$ consists of the first $\sigma^k$ characters
of such a sequence, then $H_k(d^i) = 0$ for any $i$. However, there are $(\sigma!)^{\sigma^{k-1}}$
such sequences [6,17,18] so, by Li and Vitányi's Incompressibility Lemma [19],
for any fixed algorithm $A$, the maximum Kolmogorov complexity of $d$ relative
to $A$ is at least $\lfloor \sigma^{k-1} \log \sigma! \rfloor = \Omega(\sigma^k)$ bits. We used this fact in [12] to prove
that a one-pass algorithm cannot always compress well in terms of the $k$th-order
empirical entropy without using memory exponential in $k$: we can compute $d$ from the
configurations of $A$'s memory when it starts and finishes reading any copy of $d$
in $d^i$ and its output while reading that copy (if there were another string $d'$ that
took $A$ between those configurations while producing that output, then we could
substitute $d'$ for that copy of $d$ without changing the total encoding); therefore,
if $A$ uses $o(\sigma^k)$ bits of memory, then its total output must be $\Omega(|d^i|)$ bits.
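
The claim that $H_k(d^i) = 0$ can be checked numerically. The sketch below builds one particular De Bruijn cycle with a standard Lyndon-word construction (our choice; any De Bruijn sequence would do) and evaluates $|x| H_k(x)$ on repeated copies. The names `de_bruijn_cycle` and `n_times_hk` are ours, the alphabet is assumed to be the digits $0, \ldots, \sigma - 1$ with $\sigma \leq 10$, and logarithms are base 2.

```python
from collections import Counter, defaultdict
from math import log2


def de_bruijn_cycle(sigma, k):
    """First sigma**k characters of one sigma-ary De Bruijn sequence of order k."""
    a = [0] * (sigma * k)
    out = []

    def extend(t, p):
        if t > k:
            if k % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            extend(t + 1, p)
            for j in range(a[t - p] + 1, sigma):
                a[t] = j
                extend(t + 1, t)

    extend(1, 1)
    return "".join(str(c) for c in out)


def n_times_hk(x, k):
    """Length times kth-order empirical entropy, |x| * H_k(x), in bits."""
    followers = defaultdict(Counter)  # context w -> counts of characters following w
    for i in range(k, len(x)):
        followers[x[i - k:i]][x[i]] += 1
    total = 0.0
    for counts in followers.values():
        m = sum(counts.values())
        total += sum(c * log2(m / c) for c in counts.values())
    return total


d = de_bruijn_cycle(2, 3)
```

With these definitions, `n_times_hk(d * 5, 3)` is 0 because every 3-tuple is always followed by the same character, while `n_times_hk(d * 5, 2)` is positive.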

3 The Standard model 

We can easily store $s$ in $n H_0(s) + 2n + O(1)$ bits in one pass with $O(\log n)$ bits of
memory with dynamic Huffman coding [27], for example; by running a separate
copy of the algorithm for each possible $k$-tuple for any given $k$, we can store
$s$ in $n H_k(s) + 2n + O(\sigma^k)$ bits in one pass with $O(\sigma^k \log n)$ bits of memory.
With adaptive arithmetic coding instead of dynamic Huffman coding, the $2n$
term becomes approximately $n / 100$ (see, e.g., [E]). With the version of LZ77
we described in [12], for any fixed $k$ we can store $s$ in $n H_k(s) + o(n)$ bits in
one pass with $o(n)$ bits of memory, but with both $o(n)$ terms nearly linear. We
now show we can substantially reduce those terms simultaneously, answering a
question we posed in that paper.

Theorem 3.1. In the Standard model with $O(n^c)$ bits of memory, we can store $s$
in, and later recover it from, $n H_k(s) + O(\sigma^k n^{1 - c + \epsilon})$ bits for all $k$ simultaneously.

Proof. Let $A$ be Grossi, Gupta and Vitter's algorithm. They proved $A$ stores $s$ in
$n H_k(s) + O(\sigma^k \log n)$ bits for all $k$ simultaneously using $O(n)$ time so, although
they did not give an explicit bound on the memory used, we know it is at most
$A$'s time complexity multiplied by the word size, i.e., $O(n \log n)$ bits.

First, suppose we know $n$ in advance. We process $s$ in $O(n^{1 - c + \epsilon/2})$ blocks
$b_1, \ldots, b_m$, each of length $O(n^{c - \epsilon/2})$: we read each block $b_i$ in turn, compute and
output $A(b_i)$, using $O(|b_i| \log |b_i|) \subseteq O(n^c)$ bits of memory, and erase $b_i$
from memory. As we noted in Section 2, length times empirical entropy is
superadditive, so the total length of the encodings we output is at most

$$\sum_{i=1}^{m} \left( |b_i| H_k(b_i) + O(\sigma^k \log |b_i|) \right) \leq n H_k(s) + O(\sigma^k n^{1 - c + \epsilon})$$

bits for all $k$ simultaneously.

Now, suppose we do not know $n$ in advance. We work as before but we start
with a constant estimate of $n$ and, each time we have read that many characters
of $s$, we double it. This way, we increase the size of the largest block by a factor
of less than 2 and the number of blocks by an $O(\log n)$-factor, so our asymptotic
bounds on the memory used and the whole encoding's length do not change. □
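
The superadditivity that drives this proof can be observed numerically. The following Python sketch splits a string into fixed-length blocks and compares the sum of $|b_i| H_k(b_i)$ over the blocks with $n H_k(s)$; it measures only the entropy terms and does not implement Grossi, Gupta and Vitter's encoder. The helper names `n_times_hk` and `split_blocks`, the base-2 logarithm and the test string are our assumptions.

```python
from collections import Counter, defaultdict
from math import log2


def n_times_hk(x, k):
    """Length times kth-order empirical entropy, |x| * H_k(x), in bits."""
    followers = defaultdict(Counter)  # context w -> counts of characters following w
    for i in range(k, len(x)):
        followers[x[i - k:i]][x[i]] += 1
    total = 0.0
    for counts in followers.values():
        m = sum(counts.values())
        total += sum(c * log2(m / c) for c in counts.values())
    return total


def split_blocks(s, block_len):
    """Consecutive blocks b_1, ..., b_m of s, each of length at most block_len."""
    return [s[i:i + block_len] for i in range(0, len(s), block_len)]


s = "mississippi" * 20
blocks = split_blocks(s, 16)
blockwise = {k: sum(n_times_hk(b, k) for b in blocks) for k in range(4)}
whole = {k: n_times_hk(s, k) for k in range(4)}
```

For every `k`, `blockwise[k]` is at most `whole[k]`: encoding block by block never increases the entropy term, so the only cost is the per-block redundancy, which grows with the number of blocks.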

Extending our lower bounds from that paper, we now show we cannot reduce
the factor $n^{1 - c + \epsilon}$ much further unless we increase the factor $\sigma^k$, not even if we
multiply the bound on the encoding's length by any constant coefficient. This
lower bound holds for both compression and decompression; as we noted in [12],
"good bounds [for decompression] are equally important, because often data
is compressed once by a powerful machine (e.g., a server or base-station) and
then transmitted to many weaker machines (clients or agents) who decompress
it individually."

Theorem 3.2. In the Standard model with $O(n^c)$ bits of memory, we cannot
store $s$ in $O(n H_k(s) + \sigma^k n^{1 - c - \epsilon})$ bits in the worst case for, e.g.,
$k = \lceil (c + \epsilon/2) \log_\sigma n \rceil$.

Proof. Consider any compression algorithm $A$ that works in the Standard model
with $O(n^c)$ bits of memory. Suppose $s$ consists of copies of the first $\sigma^k$ characters
$d$ in a $\sigma$-ary De Bruijn sequence of order $k = \lceil (c + \epsilon/2) \log_\sigma n \rceil$ whose Kolmogorov
complexity relative to $A$ is $\Omega(\sigma^k) = \Omega(n^{c + \epsilon/2})$ bits. As we noted in Section 2,
we can compute $d$ from the configurations of $A$'s memory when it starts and
finishes reading any copy of $d$ and its output while reading that copy. Since $A$'s
memory is asymptotically smaller than $d$'s Kolmogorov complexity relative to
$A$, it must output $\Omega(\sigma^k)$ bits for every copy of $d$ in $s$, i.e., $\Omega(n)$ bits altogether.
Since $H_k(s) = 0$, however, $O(n H_k(s) + \sigma^k n^{1 - c - \epsilon}) = O(n^{1 - \epsilon/2})$. □
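
The arithmetic behind the last two steps can be spelled out as follows. This is a sketch under the simplifying assumption that $s$ consists of exactly $n / \sigma^k$ copies of $d$; it uses only the choice $k = \lceil (c + \epsilon/2) \log_\sigma n \rceil$ and the fact that $\sigma$ is constant.

```latex
\begin{align*}
n^{c + \epsilon/2} \;\leq\; \sigma^k &< \sigma^{(c + \epsilon/2) \log_\sigma n + 1} = \sigma \, n^{c + \epsilon/2}, \\
\sigma^k \, n^{1 - c - \epsilon} &< \sigma \, n^{c + \epsilon/2} \, n^{1 - c - \epsilon}
  = \sigma \, n^{1 - \epsilon/2} = O(n^{1 - \epsilon/2}), \\
\frac{n}{\sigma^k} \cdot \Omega(\sigma^k) &= \Omega(n).
\end{align*}
```

The first line gives the lower bound on the complexity of $d$, the second the claimed size of the target bound when $H_k(s) = 0$, and the third the total output; since $\Omega(n)$ is not $O(n^{1 - \epsilon/2})$, the target bound cannot be met.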

Theorem 3.3. In the Standard model with $O(n^c)$ bits of memory, we cannot
recover $s$ from $O(n H_k(s) + \sigma^k n^{1 - c - \epsilon})$ bits in the worst case for, e.g.,
$k = \lceil (c + \epsilon/2) \log_\sigma n \rceil$.

Proof. Let $A$ be a decompression algorithm that works in the Standard model
with $O(n^c)$ bits of memory and let $s$ be as in the proof of Theorem 3.2. We can
also compute $d$ from the configuration of $A$'s memory when it starts outputting
any copy of $d$ and the bits it reads while outputting that copy; since $A$'s memory
is asymptotically smaller than $d$'s Kolmogorov complexity relative to $A$, it must
read $\Omega(\sigma^k)$ bits for each copy of $d$ in $s$, i.e., $\Omega(n)$ bits altogether, whereas
$O(n H_k(s) + \sigma^k n^{1 - c - \epsilon}) = O(n^{1 - \epsilon/2})$. □

Finally, we now prove that, if we could compute the BWT in the Standard 
model, then we could achieve the bounds we have just proven unachievable; we 
will draw the obvious conclusion in Section 4 as part of a more general theorem.

Lemma 3.4. In the Standard model with $O(\log n)$ bits of memory, we can store
BWT($s$) in, and later recover it from, $3.4 n H_k(s) + O(\sigma^k)$ bits for all $k$
simultaneously.

Proof. We encode BWT($s$) by composing move-to-front, run-length coding and
adaptive arithmetic coding; since encoding or decoding each of the three steps
takes one pass and $O(\log n)$ bits of memory, so does their composition. As we
noted in Section 2, the resulting encoding contains at most $3.4 n H_k(s) + O(\sigma^k)$
bits for all $k$ simultaneously. □
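
The first two stages of this pipeline are simple enough to sketch in Python; the adaptive arithmetic coder is omitted. Both stages read their input once, left to right, and keep only a list of the (constantly many) alphabet characters or one counter, which is why each fits in one pass with $O(\log n)$ bits of memory. The function names, the 0-based list positions and the representation of runs as (value, length) pairs are our choices.

```python
def mtf_encode(text, alphabet):
    """Move-to-front: replace each character by its 0-based position in a list."""
    order = list(alphabet)
    codes = []
    for ch in text:
        pos = order.index(ch)
        codes.append(pos)
        order.insert(0, order.pop(pos))  # move the character to the front
    return codes


def mtf_decode(codes, alphabet):
    """Inverse of mtf_encode, maintaining the same list."""
    order = list(alphabet)
    chars = []
    for pos in codes:
        ch = order.pop(pos)
        chars.append(ch)
        order.insert(0, ch)
    return "".join(chars)


def run_lengths(codes):
    """Run-length coding: collapse each maximal run into a (value, length) pair."""
    runs = []
    for c in codes:
        if runs and runs[-1][0] == c:
            runs[-1][1] += 1
        else:
            runs.append([c, 1])
    return [tuple(r) for r in runs]
```

For instance, `mtf_encode("aaabbb", "ab")` yields `[0, 0, 0, 1, 0, 0]`, whose runs are `[(0, 3), (1, 1), (0, 2)]`: local homogeneity in the input becomes long runs of 0s.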

4 The Multipass and W-Streams models 

We now extend our tradeoff to the Multipass and W-Streams models. Of course, 
anything we can do in the Standard model we can do in those models, so the 
upper bounds are immediate. 



Theorem 4.1. In the Multipass and W-Streams models with $O(n^c)$ bits of memory
and one pass, we can store $s$ in $n H_k(s) + O(\sigma^k n^{1 - c + \epsilon})$ bits for all $k$
simultaneously.

Conversely, lower bounds for the W-Streams model apply to the Multipass
model. We could quite easily extend our proofs to include the Multipass model
alone: e.g., to see we cannot compress $s$ very well with polylogarithmic passes,
notice we can compute $d$ from the configurations of $A$'s memory when it starts
and finishes reading any copy of $d$ and its output while reading that copy during
each pass; since $\log^{O(1)} n = o(n^{\epsilon/2})$, even a polylogarithmic number of $A$'s
memory configurations are asymptotically smaller than $d$'s Kolmogorov
complexity relative to $A$, so the rest of our argument still holds. The proof for the
W-Streams model must be slightly different because, for example, after the first
pass the tape will generally not contain copies of $d$.

Theorem 4.2. In the Multipass and W-Streams models with $O(n^c)$ bits of memory
and $\log^{O(1)} n$ passes, we cannot store $s$ in $O(n H_k(s) + \sigma^k n^{1 - c - \epsilon})$ bits in the
worst case for, e.g., $k = \lceil (c + \epsilon/2) \log_\sigma n \rceil$.

Proof. We need consider only the more general W-Streams model. Consider any
compression algorithm $A$ that works in the W-Streams model with $O(n^c)$ bits
of memory and $\log^{O(1)} n$ passes, and let $s$ be as in the proof of Theorem 3.2.
Without loss of generality, assume $A$'s output consists of the contents of the tape
after its last pass (any output from intermediate passes can be written on the
tape instead). Notice any substring of characters on the tape immediately after a
particular pass must have been written (or left untouched) while $A$ was reading
a substring of characters on the tape immediately before that pass. Suppose $A$
makes $p$ passes over $s$. Consider any copy $d_0$ of $d$ in $s$ and, for $p \geq i \geq 1$, let $d_i$ be
the substring $A$ writes while reading $d_{i-1}$. We claim we can compute $d$ from $d_p$
and the memory configurations of $A$ when it starts and finishes reading each $d_i$.
To see why, suppose there were a sequence $d'_0 \neq d_0, d'_1, \ldots, d'_{p-1}, d'_p = d_p$ such
that, for $p \geq i \geq 1$, $d'_{i-1}$ took $A$ between the $i$th pair of memory configurations
while writing $d'_i$; then we could substitute $d'_0$ for $d_0$ without changing the total
encoding. It follows that $d_p$ must contain $\Omega(\sigma^k)$ bits and, so, the whole tape
must contain $\Omega(n)$ bits after the last pass, whereas
$O(n H_k(s) + \sigma^k n^{1 - c - \epsilon}) = O(n^{1 - \epsilon/2})$. □

Theorem 4.3. In the Multipass and W-Streams models with $O(n^c)$ bits of memory
and $\log^{O(1)} n$ passes, we cannot recover $s$ from $O(n H_k(s) + \sigma^k n^{1 - c - \epsilon})$ bits
in the worst case for, e.g., $k = \lceil (c + \epsilon/2) \log_\sigma n \rceil$.

Proof. Again, we consider only the W-Streams model. Consider any decompression
algorithm $A$ that works in the W-Streams model with $O(n^c)$ bits of memory
and $\log^{O(1)} n$ passes, and let $s$ be as in the proof of Theorem 3.2. Without loss
of generality, assume the tape contains $s$ after $A$'s last pass. Consider any copy
$d_0$ of $d$ in $s$ and, for $p \geq i \geq 1$, let $d_i$ be the substring $A$ reads while writing
$d_{i-1}$; notice this is the reverse of the definition in the proof of Theorem 4.2.
Notice we can compute $d$ from $d_p$ and the memory configuration of $A$ when it
starts reading each $d_i$: running $A$ on $d_i$, starting in the memory configuration,
produces $d_{i-1}$. It follows that $A$ must read $\Omega(\sigma^k)$ bits for each copy of $d$ in $s$,
i.e., $\Omega(n)$ bits altogether, whereas $O(n H_k(s) + \sigma^k n^{1 - c - \epsilon}) = O(n^{1 - \epsilon/2})$. □

We note in passing that in the W-Streams model, we can easily reduce sorting
the characters in $s$ to computing the BWT: we compute
$s' = (s_1, s_0)(s_2, s_1) \cdots (s_n, s_{n-1})(s_0, s_n)$; we compute
$\mathrm{BWT}(s') = (s_1, s_0)(s_{i_1 + 1}, s_{i_1}) \cdots (s_{i_n + 1}, s_{i_n})$; and
we output $s_{i_1}, \ldots, s_{i_n}$. To see why $s_{i_1}, \ldots, s_{i_n}$ are sorted, suppose $(s_{i+1}, s_i)$
precedes $(s_{j+1}, s_j)$ in BWT($s'$). The BWT arranges the pairs in $s'$ in the
lexicographic order of their predecessors (how it breaks ties does not concern us now),
which is that of their predecessors' first components, or that of their own second
components, so $s_i \leq s_j$. If $\sigma$ were unbounded, this reduction would imply we
could not compute the BWT in the first three models; since $\sigma$ is constant, however,
it is meaningless: we can sort the characters in $s$ in the Multipass model
anyway. Fortunately, we can also easily reduce storing $s$ in $O(n H_k(s) + \sigma^k \log n)$
bits to computing the BWT.
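
The reduction from sorting can be sketched in Python as follows. We build the sequence of pairs $s'$, apply an in-memory BWT with backward contexts to it (treating $s'$ cyclically, so that the pair containing $s_0$ plays the role of the special character), and read off the second components. The function name `sort_via_bwt` and the use of the empty string as $s_0$, which compares less than any non-empty character, are our assumptions.

```python
def sort_via_bwt(s):
    """Sort the characters of s by computing the BWT of a sequence of pairs."""
    s0 = ""                      # stands for s_0; "" sorts before any character
    t = [s0] + list(s)           # t[i] = s_i for i = 0, ..., n
    m = len(t)
    # s' = (s_1, s_0)(s_2, s_1) ... (s_n, s_{n-1})(s_0, s_n)
    pairs = [(t[(i + 1) % m], t[i]) for i in range(m)]

    def context(i):
        # Pairs preceding position i in s', nearest first, wrapping around.
        return [pairs[(i - d) % m] for d in range(1, m + 1)]

    bwt = [pairs[i] for i in sorted(range(m), key=context)]
    # bwt[0] is (s_1, s_0); the remaining second components are sorted.
    return [second for _, second in bwt[1:]]
```

For example, `sort_via_bwt("mississippi")` returns the eleven characters in non-decreasing order, because each pair's position is decided first by the first component of its predecessor, which equals its own second component.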

Theorem 4.4. In the Standard, Multipass and W-Streams models with $\log^{O(1)} n$
bits of memory and passes, we can neither compute nor invert BWT($s$) in the
worst case.

Proof. If we could compute or invert BWT($s$) then, by Lemma 3.4, we could
store $s$ in, or recover it from, $3.4 n H_k(s) + O(\sigma^k)$ bits for all $k$ simultaneously;
however, by Theorems 3.2, 3.3, 4.2 and 4.3, we cannot achieve this bound in the
worst case. □

5 The StreamSort model 

The ST is known as both the "Schindler Transform" and the "Sort Transform", 
so it is perhaps not surprising that it can be computed in the StreamSort model. 
Indeed, computing the ST for any given $k = O(\log n)$ takes only a constant
number of passes once we have padded the input from $O(n)$ bits to $\Omega(n \log n)$
bits; we do this padding, which takes $O(\log \log n)$ passes, because we are allowed
to expand the tape contents by only a constant factor during each pass, and
we want to eventually associate each character with an $O(\log n)$-bit key (the
$k$-tuple that precedes that character in $s$) and then stably sort by the keys.
Unfortunately, we do not know yet how to invert the ST in either the StreamSort
or Read-Write models.

Lemma 5.1. For any given $k = O(\log n)$, we can compute ST($s$) in the StreamSort
model with $O(\log n)$ bits of memory and $O(\log \log n)$ passes.

Proof. We make $O(\log \log n)$ passes, each time doubling the length of each
character's representation by padding it with 0s, until each character takes $\Omega(\log n)$
bits. We make another pass to associate each character in $s$ with the $k$-tuple
that precedes it; since $\log \sigma^k = O(\log n)$, we can use $O(\log n)$ bits of memory to
keep track of the last $k$ characters we have seen and write them as a key in front
of the next character while only doubling the number of bits on the tape. We
then use $O(1)$ passes to stably sort by those keys (i.e., computing the ST)
and, finally, delete the keys and padding. □
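
The key-then-stable-sort idea in this proof can be sketched in Python. The sketch works in memory rather than on a tape and skips the padding passes; it only shows how attaching the preceding $k$-tuple as a key and sorting stably yields the ST. The function name `schindler_transform` and the padding character used for the contexts of the first $k$ characters (assumed smaller than every input character) are our assumptions.

```python
def schindler_transform(s, k, pad="\0"):
    """Schindler (Sort) Transform of s with contexts of length k."""

    def key(i):
        # The k characters preceding position i, nearest first; positions
        # before the start of s are filled with `pad`.
        return tuple(s[i - d] if i - d >= 0 else pad for d in range(1, k + 1))

    # Python's sort is stable, so equal keys keep their order in s (i < j).
    order = sorted(range(len(s)), key=key)
    return "".join(s[i] for i in order)
```

With $k = 0$ all keys are equal and the transform is the identity; with $k = 1$, `schindler_transform("mississippi", 1)` returns `msspipisisi`; and once $k$ is large enough to separate all contexts, the output coincides with the BWT of Section 2 with the special character removed.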

Theorem 5.2. In the StreamSort model with $O(\log n)$ bits of memory and
$O(\log n \log \log n)$ passes, we can store $s$ in $1.8 n H_k(s) + O(\sigma^k \log n)$ bits for all
$k$ simultaneously.

Proof. We compute ST($s$) in $O(\log \log n)$ passes and encode it in $O(1)$ passes
by composing distance coding and adaptive arithmetic coding, for each
$k = O(\log n)$; in total, we use $O(\log n)$ bits of memory and $O(\log n \log \log n)$ passes.
As we noted in Section 2, each resulting encoding contains at most
$1.8 n H_k(s) + O(\sigma^k \log n)$ bits for the value of $k$ used to compute it; thus, the
shortest encoding contains at most $1.8 n H_k(s) + O(\sigma^k \log n)$ bits for all
$k = O(\log n)$ simultaneously and so, because

$$1.8 n H_0(s) + O(\log n) < 1.8 n H_k(s) + O(\sigma^k \log n) = \omega(n)$$

for all $k = \omega(\log n)$, for all $k$ simultaneously. □

6 The Read-Write model

Anything we can do in the StreamSort model we can do in the Read-Write
model using an $O(\log n)$-factor more passes, so Theorem 5.2 implies we can
store $s$ in $1.8 n H_k(s) + O(\sigma^k \log n)$ bits for all $k$ simultaneously in the Read-Write
model with $O(\log n)$ bits of memory and $O(\log^2 n \log \log n)$ passes. It does not,
however, imply we can recover $s$ again in the Read-Write model. Fortunately,
using techniques based on the doubling algorithm by Arge, Ferragina, Grossi
and Vitter [2] for sorting strings in external memory (see also [T), we can both
compute and invert BWT($s$) in the Read-Write model. Figures 1 and 2 show
how we compute and invert BWT(mississippi); to save space, Figure 2 shows
two rounds of the algorithm in each row.

Theorem 6.1. In the Read-Write model with $O(\log n)$ bits of memory and
$O(\log^2 n)$ passes, we can store $s$ in, and later recover it from,
$1.8 n H_k(s) + O(\sigma^k \log n)$ bits for all $k$ simultaneously.

Proof. As we noted in Section 2, once we have BWT($s$), we can store it in
$1.8 n H_k(s) + O(\sigma^k \log n)$ bits for all $k$ simultaneously by composing distance
coding and adaptive arithmetic coding. To compute or invert distance coding
and to encode or decode adaptive arithmetic coding all take $O(\log n)$ bits of
memory and $O(1)$ passes. Therefore, we need consider only how to compute and
invert BWT($s$). Due to space constraints, here we only sketch these procedures;
we will give full descriptions and analyses in the full paper.

Fig. 1. Computing the BWT in the Read-Write model.

Fig. 2. Inverting the BWT in the Read-Write model.

To compute BWT($s$), we append a special character that is lexicographically
less than any character in the alphabet (# in Figures 1 and 2). We tag each
character in $s$ with a unique identifier (e.g., its position; in this model, we can
expand the contents of the tape by more than a constant factor during a pass,
so we do not need to pad, although $O(\log \log n)$ extra rounds would not make
a difference here anyway), then form a triple from it by appending a 1 and
its successor in $s$. We make two copies of the set of triples, sort the first copy
by the last component and the second copy by the first component (breaking
ties by characters' identifiers), and merge them to form quintuples. (It is the
copying and sorting step that we do not see how to do in the StreamSort model.)
We sort the set of quintuples by the fourth component, breaking ties by the
third component (ignoring characters' identifiers), breaking continued ties by the
second component, and breaking continued ties arbitrarily. (In the first round,
all the fourth and second components are 1, so we effectively sort by the third
component; notice the third component is the first component's successor in $s$
and the fifth component's predecessor in $s$, taking indices modulo $n + 1$.) Finally,
we replace the middle triples (the second, third and fourth components) with
numbers, starting at one and incrementing whenever we find a triple different
from the one before. This process results in another set of triples; if the second
components are the numbers 1 through $n + 1$, we stop; otherwise, we repeat the
procedure from the point of copying the triples.

Notice that, at the end of the first round, the first and third components
in any triple are two positions apart in $s$, taking indices modulo $n + 1$. Also,
for any two triples $(s_i, x, s_{i+2})$ and $(s_j, y, s_{j+2})$, the comparative relationship
between $x$ and $y$ is the same as the lexicographic relationship between $s_{i+1}$
and $s_{j+1}$. At the end of the second round, the first and third components in
any triple are four positions apart in $s$ and, for any two triples $(s_i, x, s_{i+4})$ and
$(s_j, y, s_{j+4})$, the comparative relationship between $x$ and $y$ is the same as the
lexicographic relationship between $s_{i+3} s_{i+2} s_{i+1}$ and $s_{j+3} s_{j+2} s_{j+1}$. To see why,
notice the relationship between $x$ and $y$ depends, in decreasing order of priority,
on the relationships between $x_1$ and $y_1$, $s_{i+2}$ and $s_{j+2}$, and $x_2$ and $y_2$ in the
quintuples $(s_i, x_2, s_{i+2}, x_1, s_{i+4})$ and $(s_j, y_2, s_{j+2}, y_1, s_{j+4})$; since these quintuples are
formed by joining triples $(s_i, x_2, s_{i+2})$ and $(s_{i+2}, x_1, s_{i+4})$, and $(s_j, y_2, s_{j+2})$ and
$(s_{j+2}, y_1, s_{j+4})$, created during the first round, the comparative relationships between
$x_1$ and $y_1$ and $x_2$ and $y_2$ are the same as the lexicographic relationships between $s_{i+3}$ and
$s_{j+3}$ and $s_{i+1}$ and $s_{j+1}$. After $O(\log n)$ rounds, when the second components are
the numbers 1 through $n + 1$, they indicate the lexicographic relationships of the
prefixes of the third components, so the third components, read in order of the
second components, are BWT($s$); in our example in Figure 1, BWT($s$) = ms#spipissii.
(We note that, if we sorted quintuples by the second component, using the third
and fourth to break ties, then we would compute the suffix array of $s$.) We need
only $O(\log n)$ bits of memory for this procedure; each sorting step takes $O(\log n)$
passes and each other step takes $O(1)$ passes, so we use $O(\log^2 n)$ passes altogether.
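
The rounds described above can be simulated in memory with a short Python sketch. It follows the triple and quintuple bookkeeping of the text but is not a Read-Write tape algorithm: a dictionary lookup stands in for the copy, sort and merge steps, and ordinary in-memory sorting stands in for tape sorting. The function name `bwt_by_doubling`, the use of positions as identifiers and `#` as the special character (assumed smaller than every input character) are our assumptions.

```python
def bwt_by_doubling(s, sentinel="#"):
    """In-memory simulation of the doubling rounds sketched above."""
    t = s + sentinel
    m = len(t)
    # Tagged characters are represented by their positions in t.
    # Initial triples: (character, 1, its successor), indices modulo m.
    triples = [(i, 1, (i + 1) % m) for i in range(m)]
    while True:
        # Join (a, x2, b) with the triple (b, x1, c) whose first component is b.
        by_first = {a: (x, c) for a, x, c in triples}
        quintuples = [(a, x2, b, *by_first[b]) for a, x2, b in triples]
        # Sort by fourth component, then third (character only), then second.
        quintuples.sort(key=lambda q: (q[3], t[q[2]], q[1]))
        # Replace each middle triple by a number, incrementing on every change.
        triples, rank, previous = [], 0, None
        for a, x2, b, x1, c in quintuples:
            middle = (x2, t[b], x1)
            if middle != previous:
                rank, previous = rank + 1, middle
            triples.append((a, rank, c))
        if rank == m:  # the second components are exactly 1, ..., m
            break
    triples.sort(key=lambda tr: tr[1])
    return "".join(t[c] for _, _, c in triples)
```

On the running example, `bwt_by_doubling("mississippi")` stops after three rounds and returns `ms#spipissii`, in agreement with the text.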

To invert BWT($s$), we again tag each character in BWT($s$) with a unique
identifier and form triples. This time, however, we prepend a ? to each character,
then prepend the corresponding character in the stable sort of BWT($s$); finally,
in the triple whose first component is the special character not in the alphabet,
we replace the ? by $n + 1$. As for computing BWT($s$), we make two copies of
the set of triples, sort them and merge them. This time, for each quintuple, if the
second component is not a ? or the third component is a ?, then we simply
delete the third and fourth components; if the second component is a ? and the
third component is not, then we put the third component minus 1 in the second
component, then delete the third and fourth components. This process results
in another set of triples; if the second components are the numbers 1 through
$n + 1$ in some order, we stop; otherwise, we repeat the procedure from the point
of copying the triples.

By the definition of the BWT, the $i$th character in the stable sort of BWT($s$)
is the predecessor in $s$ of the $i$th character in BWT($s$). Therefore, at the end of
the first round, the first and third components in any triple are two positions
apart in $s$, taking indices modulo $n + 1$; also, the triples that have $s_n$ and $s_{n+1}$
as their first components have $n$ and $n + 1$ as their second components. At the
end of the second round, the first and third components of any triple are four
positions apart in $s$ and the triples that have $s_{n-2}$, $s_{n-1}$, $s_n$ and $s_{n+1}$ as their
first components have $n - 2$, $n - 1$, $n$ and $n + 1$ as their second components.
After $O(\log n)$ rounds, when the second components are the numbers from 1
through $n + 1$ in some order, they indicate the positions in $s$ of the triples'
first components. Sorting by the second component and ignoring the special
character, we can recover $s$. In our example in Figure 2, the triple that starts
'm' then has a 1 as its second component; the triples that start 'i' then have 2,
5, 8 and 11; the triples that start 'p' then have 9 and 10; and the triples that
start 's' then have 3, 4, 6 and 7; thus, sorting them by the second component
and outputting the first, we recover mississippi. Again, we need only $O(\log n)$
bits of memory for this procedure; each sorting step takes $O(\log n)$ passes and
each other step takes $O(1)$ passes, so we use $O(\log^2 n)$ passes altogether. □
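
The property that the inversion relies on, namely that the $i$th character in the stable sort of BWT($s$) is the predecessor in $s$ of the $i$th character in BWT($s$), already gives a simple in-memory inversion. The Python sketch below follows predecessor links one at a time starting from the special character; it is a substitute for, not an implementation of, the doubling procedure described above, which resolves these links in $O(\log n)$ rounds. The function name `inverse_bwt` and the special character `#` are our assumptions.

```python
def inverse_bwt(b, sentinel="#"):
    """Invert the backward-context BWT by following predecessor links."""
    m = len(b)
    # Stable sort of the positions of b by character, the sentinel first.
    # order[r] is the position in b of the predecessor (in s) of b[r].
    order = sorted(range(m), key=lambda i: (b[i] != sentinel, b[i]))
    r = b.index(sentinel)        # the sentinel sits just after the end of s
    reversed_chars = []
    for _ in range(m - 1):
        r = order[r]             # step to the predecessor
        reversed_chars.append(b[r])
    return "".join(reversed(reversed_chars))
```

For the example of Figure 2, `inverse_bwt("ms#spipissii")` returns `mississippi`.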



Grohe and Schweikardt showed that, given a sequence of $n$ numbers $x_1, \ldots, x_n$,
each of $2 \log n$ bits, we cannot generally sort them using $o(\log n)$ passes, $O(n^{1 - \epsilon})$
bits of memory and $O(1)$ tapes, for any positive constant $\epsilon$. We can easily obtain
the same lower bound for the BWT, via the following reduction from sorting:
given $x_1, \ldots, x_n$, using one pass, $O(\log n)$ bits of memory and two tapes, for
$1 \leq i \leq n$ and $1 \leq j \leq 2 \log n$, we replace the $j$th bit $x_i[j]$ of $x_i$ by $x_i[j]\ 2\ x_i\ i\ j$,
writing 2 as a single character, $x_i$ in $2 \log n$ bits, $i$ in $\log n$ bits and $j$ in $\log \log n + 1$
bits; notice the resulting string is of length $n (3 \log n + \log \log n + 2)$. (For
simplicity, we now consider each character's context as starting at its successor and
extending forwards.) The only characters followed by 2s in this string are the bits
at the beginning of replacement phrases so, if we perform the BWT of it, the last
$2 n \log n$ characters of the transformed string are the bits of $x_1, \ldots, x_n$; moreover,
since the lexicographic order of equal-length binary strings is the same as their
numeric order, the $x_i[j]$s will be arranged by $x_i$s, with ties broken by the $i$s (so
if $x_i = x_{i'}$ with $i < i'$, then every $x_i[j]$ comes before every $x_{i'}[j']$), and further
ties broken by the $j$s; thus, the last $2 n \log n$ bits of the transformed string are
$x_1, \ldots, x_n$ in sorted order.
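
This reduction is easy to try out in Python. The sketch below builds the replacement phrases, applies an in-memory BWT with forward contexts (sorting cyclic contexts explicitly, which is our simplification) and reads the sorted numbers off the end of the result. The function name `sort_numbers_via_bwt`, the explicit `width` parameter in place of $2 \log n$, and the exact field widths chosen for $i$ and $j$ are our assumptions.

```python
def sort_numbers_via_bwt(xs, width):
    """Sort non-negative integers of `width` bits each via a forward-context BWT."""
    n = len(xs)
    iw = max(1, (n - 1).bit_length())       # bits used to write the index i
    jw = max(1, (width - 1).bit_length())   # bits used to write the bit position j
    phrases = []
    for i, x in enumerate(xs):
        bits = format(x, f"0{width}b")
        for j, bit in enumerate(bits):
            # The jth bit of x_i, then the character 2, then x_i, i and j.
            phrases.append(bit + "2" + bits + format(i, f"0{iw}b") + format(j, f"0{jw}b"))
    t = "".join(phrases)
    m = len(t)
    doubled = t + t
    # Order positions by the text that follows them, wrapping around.
    order = sorted(range(m), key=lambda p: doubled[p + 1:p + 1 + m])
    transformed = "".join(t[p] for p in order)
    tail = transformed[m - n * width:]      # the characters whose context starts with 2
    return [int(tail[q:q + width], 2) for q in range(0, len(tail), width)]
```

For example, `sort_numbers_via_bwt([5, 3, 7, 3, 0, 6], 4)` returns `[0, 3, 3, 5, 6, 7]`: the characters followed by 2 come last because 2 is the largest symbol, and among them the order is decided by $x_i$, then $i$, then $j$.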